Qi Fan

Geometry-Aware Implicit Memory for Video World Models

Jun 01, 2026
Via arXiv

Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation

May 19, 2026
Via arXiv

CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models

Apr 14, 2026
Via arXiv

Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning

Apr 08, 2026
Via arXiv

VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning

Mar 26, 2026
Via arXiv

Prompt-Free Universal Region Proposal Network

Mar 18, 2026
Via arXiv

PointAlign: Feature-Level Alignment Regularization for 3D Vision-Language Models

Feb 28, 2026
Via arXiv

DreamWorld: Unified World Modeling in Video Generation

Feb 28, 2026
Via arXiv

Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning

Feb 27, 2026
Via arXiv

Pathwise Test-Time Correction for Autoregressive Long Video Generation

Feb 05, 2026
Via arXiv